This tutorial shows how to explore intra-tumor heterogeneity within cutaneous malignant melanoma with PANDA. The data required for this tutorial can be downloaded from Zenodo.

Deconvolution

We first show how to estimate cell type proportions and cell-type-specific gene expression for cutaneous malignant melanoma spatial transcriptomics data via PANDA.

The spatial transcriptomics dataset and the scRNA-seq reference dataset to be used can be downloaded from Zenodo, and they are stored in the data folder.

sc_counts_path <- "./data/sc_data/sc_counts.csv"
sc_labels_path <- "./data/sc_data/sc_labels.csv"
st_counts_path <- "./data/st_data/st_counts.csv"

PANDA comprises two key steps: archetypal analysis on the scRNA-seq reference and deconvolution of spatial transcriptomics data.

Loading package

library(PANDA)

Archetypal analysis on the scRNA-seq reference

In the first step, PANDA employs archetypal analysis on the scRNA-seq reference data to infer cell-type-specific archetypes, providing a comprehensive characterization of the state space for each cell type.

sc_counts <- read.csv(sc_counts_path, header = TRUE, row.names = 1)
sc_labels <- read.csv(sc_labels_path, header = TRUE, row.names = 1)
names_sc_labels <- rownames(sc_labels)
sc_labels <- sc_labels$celltype
names(sc_labels) <- names_sc_labels

sc_results <- sc_train(sc_counts, sc_labels, n_archetypes_vec = 10, tol = 1e-8, maxiter = 2000, do_initialize = TRUE, do_plot = TRUE, n_hvgs = 2000, n_markers = 20, n_sample_cells = 500, do_parallel = TRUE, n_cores = 6, save_res = TRUE, save_dir = "./PANDA_results/sc_results")

Deconvolution of spatial transcriptomics data

In the second step, PANDA performs deconvolution on the spatial transcriptomics data to deliver accurate estimates of cell type composition and cell-type-specific gene expression.

st_counts <- read.csv(st_counts_path, header = TRUE, row.names = 1)

st_results <- st_train(st_counts, sc_results = sc_results, n_hvgs = 5000, sigma = 0.3, tol = 1e-8, maxiter = 2000, save_res = TRUE, save_dir = "./PANDA_results/st_results")

Extracting results

We can extract cell type proportions and cell-type-specific gene expression for each spot in the spatial transcriptomics data as follows.

proportion <- st_results$proportion
expression <- st_results$mu

Analysis

Then, we explore intra-tumor heterogeneity within cutaneous malignant melanoma with the cell type proportions and cell-type-specific gene expression estimated by PANDA.

In order to facilitate the following analysis, the sc_results and st_results obtained by us can also be downloaded from Zenodo, and they are stored in the PANDA_results folder. Load them as follows.

sc_results <- readRDS("./PANDA_results/sc_results/sc_results.rds")
st_results <- readRDS("./PANDA_results/st_results/st_results.rds")

We first specify where the analysis results are stored.

output_dir <- "./analysis"

Some additional functions need to be loaded to support analysis.

source("./code/utils.R")
source("./code/plot.R")

The spatial location information of the spatial transcriptomics data is stored in the data folder, and can be loaded as follows.

st_location_path <- "./data/st_data/st_location.csv"
st_location <- read.csv(st_location_path, header = TRUE, row.names = 1)
colnames(st_location) <- c("y", "x")

For ease of subsequent analysis, we divided the melanoma tissue into lymphoid-rich, melanoma-rich and stromal-rich regions based on the pathological annotation. The region information of the spots is stored in the data folder, and can be loaded as follows.

region_labels_path <- "./data/st_data/region_labels.csv"
region_labels <- read.csv(region_labels_path, header = TRUE, row.names = 1)
names_region_labels <- rownames(region_labels)
region_labels <- region_labels$region_labels
names(region_labels) <- names_region_labels

lymphoid_spots <- names(region_labels)[region_labels == "Lymphoid"]
melanoma_spots <- names(region_labels)[region_labels == "Melanoma"]

We can plot the region labels as follows. The generated figure will be stored in the specified output_dir folder in pdf format.

plot_scatter_label(as.data.frame(region_labels), st_location, img = NULL, use_color = c("Others" = "grey90", "Lymphoid" = "#F8766D", "Melanoma" = "#00BFC4", "Stromal" = "#7081DA"), fname = "region_labels", save_dir = output_dir, reverse_x = FALSE, reverse_y = FALSE, n_cols = 1, point_size = 5, width = 5.8, height = 5)

Cell type proportion

With the estimated cell type composition, we investigate the spatial distribution of cell types within the melanoma tissue.

The spatial scatter pie plot of the cell type compostions inferred by PANDA can be visualized as follows.

proportion <- st_results$proportion
unique_labels <- sort(colnames(proportion))
use_color <- colorRampPalette(RColorBrewer::brewer.pal(9, "Set1"))(length(unique_labels))
names(use_color) <- unique_labels

plot_scatterpie(proportion, st_location, img = NULL, use_color = use_color, fname = "proportion_scatterpie", save_dir = output_dir, reverse_x = FALSE, reverse_y = FALSE, pie_r = 0.5, width = 7, height = 7)

The scaled proportions of all cell types can be visualized as follows.

plot_scatter_heatmap(proportion, st_location, img = NULL, feature = unique_labels, scale = TRUE, fname = "scaled_proportion_heatmap", save_dir = output_dir, reverse_x = FALSE, reverse_y = FALSE, n_cols = 4, point_size = 3, width = 12, height = 8, legend_position = "bottom")

For each cell type, the proportions across all spots are scaled using min-max normalization such that the minimum value is transformed to 0, the maximum value is transformed to 1, and all other values are scaled to decimals between 0 and 1.

In order to validate the spatial distribution of cell types, we visualize the spatial expression pattern of cell-type-specific metagenes. The metagene for each cell type is defined as the mean of the top five marker genes identified by Seurat based on the scRNA-seq reference.

marker_genes <- find_markers(t(sc_counts[, intersect(colnames(sc_counts), colnames(st_counts))]), sc_labels, n_markers = NULL)
marker_genes <- marker_genes[unique_labels]

st_counts_norm <- st_counts / rowSums(st_counts)
st_counts_norm_metagene <- sapply(marker_genes, function(x){log2(rowMeans(st_counts_norm[, x[1:5]]) * 1e4 + 1)})
plot_scatter_heatmap(st_counts_norm_metagene, st_location, img = NULL, feature = unique_labels, scale = TRUE, fname = "scaled_metagene_exp_heatmap", save_dir = output_dir, reverse_x = FALSE, reverse_y = FALSE, n_cols = 4, point_size = 3, width = 12, height = 8, legend_position = "bottom")

Additionally, we assess the enrichment or depletion of cell types relative to the lymphoid-rich, melanoma-rich, and stromal-rich regions. The specific procedures are as follows. We first calculate the average cell type proportion of all spots in each region, termed as the true average. Next, we permute the indices of spots corresponding to the cell type proportions. In other words, we randomly reassign the cell type proportion vectors to each spot. We calculate the average cell type proportion of all spots in each region after permutation, termed as the permuted average. We repeat the permutation 10,000 times and obtain the set of permuted averages. Then, the differences between the true average and all the permuted averages are computed. Finally, the enrichment score for each region is defined as the mean of these differences divided by the standard deviation of these differences.

enrich_res <- enrichment_celltype(proportion, region_labels, n_shuffle = 10000)

plot_df <- enrich_res$diff_mean[c("Lymphoid", "Melanoma", "Stromal"),]
plot_df <- as.data.frame(cbind(rownames(plot_df), plot_df))
colnames(plot_df) <- c("Region", colnames(enrich_res$diff_mean))
plot_df <- reshape2::melt(plot_df, id.vars = c("Region"), variable.name = "Label", value.name = "value")
plot_df$Region <- factor(plot_df$Region, levels = c("Lymphoid", "Melanoma", "Stromal"))
plot_df$Label <- factor(plot_df$Label, levels = unique_labels)

fname <- "enrichment_celltype_region"
pdf(paste(output_dir, paste(fname, "pdf", sep = "."), sep = "/"), width = 4.5, height = 4)
fig <- ggplot() +
        geom_point(data = plot_df, aes(x = Region, y = Label, size = abs(as.numeric(value)), color = as.character(sign(as.numeric(value))))) + 
        theme_bw() + 
        scale_size(range = c(1, 8)) +
        scale_color_manual(values = c("-1" = "#D13854", "1" = "#43BF55")) +
        scale_y_discrete(limits = rev(levels(plot_df$Label))) +
        labs(title = fname, color = "Direction", size = "Effect size")
print(fig)
dev.off()

Green represents enrichment, and red represents depletion. The marker size is proportional to the score of enrichment or depletion.

Cell-type-specific gene expression

Using inferred cell-type-specific gene expression, we explore transcriptional differences within the same cell type. We first visualize the cell-type-specific gene expression estimated by PANDA. We map the estimated spot-level cell-type-specific gene expression profiles to the UMAP constructed by the single cells of the corresponding cell type from the scRNA-seq reference. Note that for each cell type, we only focus on the cell-type-specific gene expression of spots containing the proportion of that cell type greater than 0.1. In addition, we focus on the genes used for archetypal analysis, which include highly variable genes and cell-type-specific differentially expressed genes.

mu_list <- st_results$mu
keep_genes <- colnames(sc_results$archetypes_list[[1]])
mu_list <- lapply(mu_list, function(x){x[, keep_genes]})

mu_reference_res <- plot_mu_reference(mu_list, proportion, sc_counts, sc_labels, archetypes_list = sc_results$archetypes_list, min_prop = 0.1, f_name = "mu_reference", save_dir = output_dir, return_res = TRUE, n_cols = 4, width = 12, height = 6)

The gray circles represent the single cells from the scRNA-seq reference and the blue triangles represent the inferred cell-type-specific archetypes. The red circles represent the spots and the intensity of red color depends on the proportion of the corresponding cell type contained in the spot.

Spatial heterogeneity of B cells

We explore the spatial heterogeneity of B cells between the lymphoid-rich and melanoma-rich regions, aiming to elucidate transcriptional variances in B cells located at varying distances from the tumor. Note that we only focus on the spots containing the proportion of B cells greater than 0.1.

keep_spots <- which(proportion[, "B"] > 0.1)
B_mu <- mu_list[["B"]][keep_spots,]

The spatial scatter pie plot of the combination coefficients of B cell-specific archetypes for these spots can be visualized as follows.

B_alpha <- st_results$alpha[keep_spots, grep("B", colnames(st_results$alpha))]

unique_alpha <- colnames(B_alpha)
use_color_alpha <- colorRampPalette(RColorBrewer::brewer.pal(9, "Set1"))(length(unique_alpha))
names(use_color_alpha) <- unique_alpha

plot_scatterpie(B_alpha, st_location, img = NULL, use_color = use_color_alpha, fname = "B_alpha_scatterpie", save_dir = output_dir, reverse_x = FALSE, reverse_y = FALSE, pie_r = 0.45, point_size = 6.3, width = 7, height = 7)

The combination coefficients for each archetype across the spots can be visualized as follows.

plot_scatter_heatmap(B_alpha, st_location, img = NULL, feature = colnames(B_alpha), scale = FALSE, fname = "B_alpha_heatmap", save_dir = output_dir, reverse_x = FALSE, reverse_y = FALSE, n_cols = 5, point_size = 2.5, width = 12, height = 6, legend_position = "bottom")

We identify the differentially expressed genes between the B cells in the lymphoid-rich region and the B cells in the melanoma-rich region.

B_lymphoid_spots <- intersect(rownames(B_mu), lymphoid_spots)
B_melanoma_spots <- intersect(rownames(B_mu), melanoma_spots)
marker_genes_B <- find_markers(t(B_mu[c(B_lymphoid_spots, B_melanoma_spots), ] * 1e4), c(rep("Lymphoid", length(B_lymphoid_spots)), rep("Melanoma", length(B_melanoma_spots))), n_markers = NULL)

The scaled average expression of marker genes in the lymphoid-rich region and in the melanoma-rich region can be visualized as follows.

B_mu_markers_exp <- sapply(marker_genes_B, function(x){log2(rowMeans(B_mu[c(B_lymphoid_spots, B_melanoma_spots), x, drop = FALSE]) * 1e4 + 1)})

plot_scatter_heatmap(B_mu_markers_exp, st_location, img = NULL, feature = NULL, scale = TRUE, fname = "scaled_B_mu_markers_heatmap", save_dir = output_dir, reverse_x = FALSE, reverse_y = FALSE, n_cols = 2, point_size = 2.1, width = 5, height = 4)

We perform the enrichment analysis on these marker genes to explore the different roles of B cells in different regions within the melanoma tissue.

enrichGO_B_lymphoid <- plot_enrichGO(marker_genes_B[["Lymphoid"]], n_category = 12, fname = "enrichGO_B_lymphoid", save_dir = output_dir, width = 5, height = 6, return_results = TRUE)
enrichGO_B_melanoma <- plot_enrichGO(marker_genes_B[["Melanoma"]], n_category = 12, fname = "enrichGO_B_melanoma", save_dir = output_dir, width = 5, height = 6, return_results = TRUE)

Spatially variable genes

We explore genes within malignant cells that varied with the proportion of immune cells within the melanoma-rich region. Here, immune cells encompass B cells, macrophages, NK cells, and T cells. Our focus is on spots containing a proportion of malignant cells greater than 0.1 in the melanoma-rich region.

keep_spots <- intersect(rownames(proportion)[which(proportion[, "Malignant"] > 0.1)], melanoma_spots)
Malignant_mu <- mu_list[["Malignant"]][keep_spots,]

test_genes <- colnames(Malignant_mu)[colSums(Malignant_mu) != 0]
test_proportion <- rowSums(proportion[keep_spots, c("B", "T", "Macro", "NK")])

The spatial scatter pie plot of the compositions of immune cells, malignant cells and other cells at each spot in the melanoma-rich region can be visualized as follows.

temp_prop <- cbind(proportion[keep_spots, "Malignant"], test_proportion, rowSums(proportion[keep_spots, c("CAF", "Endo")]))
colnames(temp_prop) <- c("Malignant", "Immune", "Stromal")
rownames(temp_prop) <- keep_spots
use_color_temp <- c("#F8766D", "#00BFC4", "grey90")
names(use_color_temp) <- c("Immune", "Malignant", "Stromal")

plot_scatterpie(temp_prop, st_location, img = NULL, use_color = use_color_temp, fname = "Malignant_Immune_proportion_scatterpie", save_dir = output_dir, reverse_x = FALSE, reverse_y = FALSE, pie_r = 0.5, point_size = 6.75, width = 7, height = 7)

The scaled proportions of immune cells in the melanoma-rich region can be visualized as follows.

plot_scatter_heatmap(temp_prop[, "Immune", drop = FALSE], st_location, img = NULL, feature = "Immune", scale = TRUE, fname = "Malignant_scaled_Immune_proportion_heatmap", save_dir = output_dir, reverse_x = FALSE, reverse_y = FALSE, n_cols = 1, point_size = 2.2, width = 3, height = 3, legend_position = "right")

To characterize the relationship between the expression of each gene and the proportion of immune cells, we use the Pearson correlation coefficient and correlation test. Additionally, we adjust p-values using the Benjamini-Hochberg procedure.

test_cor_pvalue <- c()
for (i in seq(length(test_genes))){
    temp <- cor.test(Malignant_mu[, test_genes[i]], test_proportion, method = "pearson")
    test_cor_pvalue <- rbind(test_cor_pvalue, c(temp$estimate, temp$p.value))
}
test_cor_pvalue <- cbind(test_cor_pvalue, p.adjust(test_cor_pvalue[,2], method = "BH"))
rownames(test_cor_pvalue) <- test_genes
colnames(test_cor_pvalue) <- c("correlation", "p_value", "p_adjust")

test_cor_pvalue <- test_cor_pvalue[order(test_cor_pvalue[,1], decreasing = TRUE),]

plot_df <- data.frame(x = seq(nrow(test_cor_pvalue)), y = rev(test_cor_pvalue[, 1]), text = rev(rownames(test_cor_pvalue)), logp = rev(- log10(test_cor_pvalue[, 3])))

f_name <- "Malignant_Immune_test_cor_pvalue"
pdf(paste(output_dir, paste(f_name, "pdf", sep = "."), sep = "/"), width = 4, height = 4.5)
fig <- ggplot() +
        geom_point(data = plot_df, aes(x = x, y = y, color = logp), size = 1, shape = 21, stroke = 0.5) +
        geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
        labs(title = "Malignant_Immune_test_cor_pvalue", x = "Index", y = "Correlation") +
        scale_color_gradientn(colors = c("#5E4FA2", "#3E96B6", "#99D5A4", "#FDFEBD", "#FDDB88", "#F67948", "#9E0142")) +
        ggprism::theme_prism(base_size = 10, base_fontface = "plain")
print(fig)
dev.off()

Genes with adjusted p-values less than 0.05 and correlation coefficients greater than 0 are considered significantly positively correlated with the proportion of immune cells, forming the significantly positively correlated gene group. Genes with adjusted p-values less than 0.05 and correlation coefficients less than 0 form the significantly negatively correlated gene group.

sig_positive_genes <- rownames(test_cor_pvalue)[(test_cor_pvalue[, 3] < 0.05) & (test_cor_pvalue[, 1] > 0)]
sig_negative_genes <- rev(rownames(test_cor_pvalue)[(test_cor_pvalue[, 3] < 0.05) & (test_cor_pvalue[, 1] < 0)])

The scaled average expression patterns of the two identified gene groups can be visualized as follows.

Malignant_mu_Immune_gene <- sapply(list(sig_positive_genes, sig_negative_genes), function(x){log2(rowMeans(Malignant_mu[, x, drop = FALSE]) * 1e4 + 1)})
colnames(Malignant_mu_Immune_gene) <- c("positive", "negative")

plot_scatter_heatmap(Malignant_mu_Immune_gene, st_location, img = NULL, feature = c("positive", "negative"), scale = TRUE, fname = "scaled_Malignant_mu_Immune_gene_exp_heatmap", save_dir = output_dir, reverse_x = FALSE, reverse_y = FALSE, n_cols = 2, point_size = 2.1, width = 5, height = 4, legend_position = "right")

We perform the enrichment analysis to uncovered diverse biological functions associated with each gene group.

enrichGO_Malignant_Immune_positive <- plot_enrichGO(sig_positive_genes, n_category = 12, object = "human", fname = "enrichGO_Malignant_Immune_positive", save_dir = output_dir, width = 5, height = 6, return_results = TRUE)
enrichGO_Malignant_Immune_negative <- plot_enrichGO(sig_negative_genes, n_category = 12, object = "human", fname = "enrichGO_Malignant_Immune_negative", save_dir = output_dir, width = 5, height = 6, return_results = TRUE)